PDF Page Color Counter



🛠️ Description

This Python project is a small FastAPI service that analyzes a PDF document and counts how many of its pages are black & white and how many are in color. It is useful for document analysis, quality control, or simply checking the composition of your PDF files before printing.

Key Features:

  • Easy Integration: The counter is exposed as an HTTP endpoint, so any Python application or workflow that can upload a file can use it.

  • PDF Processing: It uses the PyMuPDF library (Python bindings for MuPDF) to render each page of the PDF to an image.

  • Color Page Detection: Each page is classified as color or black & white by comparing the average red, green, and blue values of the rendered page, and the number of pages in each group is returned.

  • Use Cases: This code can be employed in various scenarios, such as document archiving, printing optimization, or content analysis.
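The classification heuristic can be illustrated without PyMuPDF or OpenCV. The sketch below is a simplified, hypothetical version: it assumes a rendered page is already available as a list of (R, G, B) tuples, and the helper names channel_means and is_color_page, as well as the tolerance parameter, are illustrative and not part of main.py. As in main.py, a page counts as black & white when its average red, green, and blue values are equal; tolerance=0.0 reproduces that exact comparison.

```python
def channel_means(pixels):
    """Return the average R, G and B values of an iterable of (r, g, b) tuples."""
    totals = [0, 0, 0]
    count = 0
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        count += 1
    if count == 0:
        raise ValueError("page has no pixels")
    return tuple(total / count for total in totals)


def is_color_page(pixels, tolerance=0.0):
    """Hypothetical helper: True if the channel means differ by more than `tolerance`.

    A gray pixel has R == G == B, so an all-gray page has identical channel means.
    """
    red, green, blue = channel_means(pixels)
    return max(red, green, blue) - min(red, green, blue) > tolerance


gray_page = [(255, 255, 255), (0, 0, 0), (128, 128, 128)]
page_with_red = gray_page + [(255, 0, 0)]
print(is_color_page(gray_page))      # False
print(is_color_page(page_with_red))  # True
```

Because only averages are compared, colors that cancel out are missed: equal amounts of pure red (255, 0, 0), green (0, 255, 0), and blue (0, 0, 255) average to (85, 85, 85) and are reported as black & white. Checking whether any individual pixel has unequal channels avoids this, at the cost of inspecting every pixel.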

⚙️ Languages or Frameworks Used

  • Python: The primary programming language used for the project.
  • FastAPI: A modern, fast (high-performance) web framework for building APIs with Python.
  • PyMuPDF (MuPDF): A lightweight and efficient PDF processing library for Python.
  • OpenCV: Used for image analysis and processing.
  • Pillow (PIL): Python Imaging Library for working with images.
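To confirm that these libraries are available in the active environment, a small check can look up each module with importlib.util.find_spec, which locates a module without importing it. This is a sketch under stated assumptions: the REQUIRED_MODULES mapping and the missing_packages helper are hypothetical; the module names are the ones main.py imports (PyMuPDF as fitz, OpenCV as cv2, Pillow as PIL, plus NumPy), together with uvicorn, which serves the app.

```python
import importlib.util

# Hypothetical mapping: pip package name -> module name used at import time.
REQUIRED_MODULES = {
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "PyMuPDF": "fitz",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print(missing_packages() or "all requirements found")
```

find_spec only checks that a module with the given name exists, so a module named fitz being found does not guarantee that it comes from PyMuPDF.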

🌟 How to run

  • Set up a virtual environment

    • Run python -m venv myenv in your terminal to create it.
    • Activate it. On Windows, run myenv\Scripts\activate; on Linux or macOS, run source myenv/bin/activate.

  • Install the requirements

    With the virtual environment active, run pip install -r requirements.txt from the project directory.
    If any packages are still missing, install them directly:
    pip install uvicorn fastapi PyMuPDF opencv-python pillow numpy python-multipart
    PyMuPDF provides the fitz module that main.py imports. Do not install the separate fitz package from PyPI: it is unrelated and breaks import fitz.
  • Run the project

    • Start the server with uvicorn main:app --reload.
    • Open the local URL printed by uvicorn (http://127.0.0.1:8000 by default) in your browser and append /docs to it to open the FastAPI docs UI.

    Screenshot 2023-10-25 134746

    • Click POST, then Try it out.
    • Click Choose file and select the PDF whose black & white and color pages you want to count.
    • Click Execute.
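The same request the docs UI sends can also be made from a script. Below is a minimal sketch using only the Python standard library, assuming the server is running locally on uvicorn's default address (http://127.0.0.1:8000) with the POST route at /. The helper names build_multipart and count_pages are hypothetical. The form field must be named file, matching the file parameter of the endpoint.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def count_pages(pdf_path, url="http://127.0.0.1:8000/"):
    """Upload a PDF to the running counter service and return its JSON response."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart("file", os.path.basename(pdf_path), data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, count_pages("sample.pdf") returns the same dictionary of counts that the docs UI displays (sample.pdf stands for any PDF on your machine).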

📺 Demo

Screenshot 2023-10-25 133406

🤖 Author

GitHub: OM YADAV
LinkedIn: OM YADAV

Source Code: main.py

from fastapi import FastAPI, UploadFile, File
import fitz  # provided by the PyMuPDF package
import cv2
import numpy as np
from PIL import Image

app = FastAPI()


@app.post("/")
async def get_pdf(file: UploadFile = File(...)):
    # Page numbers (1-based) of color and black & white pages.
    color_list = []
    black_list = []

    # Read the upload into memory and open the PDF directly from the bytes,
    # so nothing is written to (or left behind in) the working directory.
    contents = await file.read()
    with fitz.open(stream=contents, filetype="pdf") as doc:
        for page_number, page in enumerate(doc, start=1):
            # Render the page to an RGB image without an alpha channel.
            pix = page.get_pixmap(alpha=False)
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

            # cv2.mean returns a 4-tuple of per-channel means; the fourth
            # value is 0 for a 3-channel image.
            arr = np.array(img)
            red_mean, green_mean, blue_mean, _ = cv2.mean(arr)

            # On a grayscale page every pixel has R == G == B, so the three
            # channel means are identical; any difference marks a color page.
            if red_mean == green_mean == blue_mean:
                black_list.append(page_number)
            else:
                color_list.append(page_number)

    print("\nColored Pages: ", color_list, "\n")
    print("Black & White Pages: ", black_list)
    return {"colored : ": len(color_list), "Black Count : ": len(black_list)}